ABSTRACT

The Stream Control Transmission Protocol (SCTP) allows
end-to-end load balancing in FCS networks to be performed
at the transport layer, through repeated changeover.
We present a problem in the current SCTP (RFC2960)
specification that results in unnecessary retransmissions and
“TCP-unfriendly” growth of the sender’s congestion window during
certain changeover conditions. We first illustrate the problem
using an example scenario. We then briefly describe the
proposed solutions for the problem and our future direction.

1  INTRODUCTION 

A node is multihomed if it can be addressed by multiple IP
addresses [3], as would be the case when the host has multiple
network interfaces. Network layer redundancy allows access
to a host even if one of its IP addresses becomes unreachable;
ideally, packets can be rerouted to one of the host’s alternate
IP addresses. However, since IP is connectionless, end-to-end
session persistence under failure conditions becomes the
responsibility of the transport layer and above. To provide
such fault tolerance, the Stream Control Transmission Protocol
(SCTP) supports multihoming at the transport layer. SCTP
sessions, or associations, can dynamically span multiple
local and peer IP addresses so that an association can remain
alive even if one of the endpoints’ addresses becomes
unreachable. Multihoming also allows end-to-end load balancing
to be performed at the transport layer. Bearing in mind that all
available resources should be used optimally in FCS networks,
our investigation focuses on utilizing all network resources
visible at the transport layer.

SCTP [9] is a recent standards track transport layer protocol in
the Internet Engineering Task Force (IETF). Of the salient
features that distinguish SCTP from TCP, we concern ourselves
with multihoming. SCTP multihoming allows binding of one
transport layer association to multiple IP addresses. This
binding allows an SCTP sender to send data to a multihomed
receiver through different destination addresses. For instance, in
figure 1, A could send data to B using destination address B1
or B2. SCTP’s multihoming feature was motivated by fault
tolerance; if one destination address becomes unreachable,
the destination can still send and receive via other interfaces
bound to the association.

In a multihomed SCTP association, the sender transmits data
to its peer’s primary destination address. SCTP provides for
application-initiated changeovers so that the sending
application can change the sender’s primary destination address, thus
moving the outgoing traffic to a potentially different path.¹
This feature can be used to perform end-to-end load
balancing using SCTP, thus helping FCS network elements exploit
all network resources visible at the transport layer. We
uncovered a problem in the current SCTP (RFC2960)
specification [9] that results in unnecessary retransmissions and
“TCP-unfriendly” growth of the sender’s congestion window under
certain changeover conditions.

In section 2, we present a specific example which
illustrates the problem of cwnd overgrowth with SCTP’s currently
specified handling of changeover. Section 3 presents two
changeover aware congestion control algorithms, Conservative
CACC (C-CACC) and Split Fast Retransmit CACC
(SFR-CACC), and the Rhein algorithm. In light of the bigger
problem of end-to-end load balancing, we conclude with questions
which describe our future direction in section 4.

¹ SCTP was designed as a transport protocol for telephony signaling in
SS7 networks. In an SS7 network the upper layers can dictate to which
destination address packets will be sent, motivating the application-initiated
changeover feature in SCTP.


2  CONGESTION  WINDOW  OVERGROWTH 


In this section, we present an example illustrating the
occurrence of cwnd mishandling and unnecessary retransmissions
with SCTP’s currently specified handling of changeover. The
example uses the architecture shown in figure 1. Endpoints A
and B have an SCTP association between them. Both
endpoints are multihomed, A with network interfaces A1 and A2,
and B with interfaces B1 and B2.² All four addresses are
bound to the one SCTP association. For several possible
reasons (e.g., path diversity, policy based routing, load
balancing), we assume in this example that the data traffic from A
to B1 is locally routed through A1, and from A to B2 through
A2. Non-overlapping paths are assumed, with the bottleneck
bandwidth of path 1 being 10 Mbps, and that of path 2 being
100 Mbps. The propagation delay of both paths is 200 ms, and
the path MTU for both paths is 1250 bytes.

Figure 2 shows a timeline of events for our example. The
vertical lines represent interfaces B1, A1, A2 and B2. The numbers
along the lines represent times in milliseconds. Each arrow
depicts the departure of a packet from one interface and its
arrival at the destination. The labels on the arrows are either
SCTP Transmission Sequence Numbers (TSNs) or labels of the
form S_{T_C}(T_{GS} - T_{GE}). Assuming one chunk per packet,
every packet in the example corresponds to one TSN. A number
represents the TSN of the chunk in the packet being
transmitted. A label S_{T_C}(T_{GS} - T_{GE}) represents a packet carrying a
SACK chunk with cumulative ack T_C, and gap ack for TSNs
T_{GS} through T_{GE}. C1 is the cwnd at A for destination B1, and
C2 is the cwnd at A for destination B2. C1 and C2 are denoted
in terms of MTUs, not bytes.

² More precisely, A1, A2, B1 and B2 are IP addresses associated with
link layer interfaces. Here we assume only one address per interface, so
address and interface are used interchangeably.


The assignments such as initial TSN = 1 and initial time t = 0
are arbitrary assignments to signify the beginning of the
snapshot. These assignments are not meant to imply the beginning
of the association. Initially, C2 = 2 because we assume that
either there has been no transmission to B2 before t = 0
during the lifetime of the association, or C2 has decayed³ to two
MTUs by t = 0.

2.1  Example  Description 

The sender (A) initially sends to the receiver (B) using
primary destination address B1. This setting causes packets to
leave through A1. Assume these packets leave the
transport/network layers, and get buffered at A’s link layer A1,
whereupon they get transmitted according to the channel’s
availability. This initial condition is depicted in figure 2 at
time t = 0, when in this example A has 50 packets buffered
on interface A1.

At t = 1, as TSNs 1-50 are being transmitted through A1,
the sender’s application changes the primary destination to B2,
thus requiring any new data from A to be sent to B2. In the
example, we assume TSN 51 is transmitted to the new primary
at t = 1. We refer to this moment as the changeover time. This
new primary destination causes new TSNs to leave the sender
through A2. Concurrently, the packets buffered earlier at A1
are still being transmitted. Previous packets sent through A1,
and the packets sent through A2, can arrive at the receiver B
in an interleaved fashion on interfaces B1 and B2 respectively.
In figure 2, TSNs 1, 51, 52 and 2 arrive at times 21, 21.1, 21.2
and 22, respectively. This reordering is introduced as a result of
changeover; the specific times depend on the delays of paths 1
and 2.
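
The arrival times quoted above can be checked with a short calculation. The sketch below is illustrative only and rests on assumptions that are not stated in the text: packets are sent back-to-back at the bottleneck rate, every packet is a full 1250-byte MTU, and the one-way propagation delay is taken to be 20 ms, a value chosen because it reproduces the arrival times 21, 21.1, 21.2 and 22. The function and constant names are hypothetical.

```python
MTU_BYTES = 1250
PROP_DELAY_MS = 20.0  # ASSUMPTION: one-way delay chosen to match the quoted timeline


def transmission_ms(packet_bytes, bandwidth_mbps):
    """Time to clock one packet onto the bottleneck link, in milliseconds."""
    return packet_bytes * 8 / (bandwidth_mbps * 1000)


def arrival_ms(start_ms, position, bandwidth_mbps):
    """Arrival time of the position-th back-to-back packet sent from start_ms."""
    return start_ms + position * transmission_ms(MTU_BYTES, bandwidth_mbps) + PROP_DELAY_MS


# TSNs 1 and 2 leave on path 1 (10 Mbps) from t = 0;
# TSNs 51 and 52 leave on path 2 (100 Mbps) from the changeover time t = 1.
arrivals = {
    1: arrival_ms(0.0, 1, 10),
    51: arrival_ms(1.0, 1, 100),
    52: arrival_ms(1.0, 2, 100),
    2: arrival_ms(0.0, 2, 10),
}

# Order in which the receiver sees the TSNs.
arrival_order = sorted(arrivals, key=arrivals.get)
```

Under these assumptions a packet takes 1 ms to transmit on path 1 and 0.1 ms on path 2, so the two packets sent on the faster path overtake TSN 2 and produce the interleaving described above.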

The receiver starts reporting gaps as soon as it notices
reordering. If the receiver communicates four missing reports to the
sender before all of the original transmissions (TSNs 1-50)
have been acked, the sender will start retransmitting the
unacked TSNs. SCTP’s Fast Retransmit algorithm is based on
TCP’s Fast Retransmit algorithm [4], with the additional use
of selective acks and a modification to handle some cases of
reordering.⁴ Accordingly, the SACKs resulting from the
receipt of TSNs 51-54 will be the only ones generating missing
reports. The SACKs received by A on A2 at t = 41.1 and
t = 41.2 will be considered as the first and second missing
reports for TSNs 2-50. Since these SACKs do not carry new
cumulative acks, they do not cause growth of C1. Between
t = 42 and t = 81, the cumulative ack in the SACKs received
by A on A1 increases as a consequence of the original
transmissions to destination B1 reaching B. In this period, A
receives 40 SACKs which incrementally carry cumulative acks
of 2-41.

³ The cwnd for a destination address decays exponentially if no data is
transmitted to that destination address [9].

⁴ [7] goes hand-in-hand with RFC2960. The Implementor’s guide
maintains all changes and additions to be included in RFC2960’s next version.
All implementations are expected to carry the specifications and
modifications in this guide.

The SACKs received by A on A2 at t = 81.2 and t = 81.3
carry a cumulative ack of 41, and are the third and the fourth
missing reports for TSNs 42-50. Upon the fourth missing
report, A retransmits only TSN 42, since C2 permits only one
more packet to be outstanding. Note that this falsely triggered
retransmission leads to an unnecessary reduction of C1 by
half, since the sender infers congestion from A1 to B1. At
t = 82, the SACK for the original transmission of TSN 42
reaches A on A1. Since the sender cannot distinguish
SACKs generated by transmissions from SACKs generated by
retransmissions, this SACK incorrectly acks the
retransmission of TSN 42, thereby increasing C2 by one, reducing the
amount of data outstanding on destination B2, and triggering
the retransmission of TSNs 43 and 44. At t = 83, the SACK
for the original transmission of TSN 43 arrives at A on A1.
As before, this SACK acks the retransmission (of TSN 43),
further incorrectly increasing C2, and triggering
retransmission of TSNs 45 and 46. This behaviour of SACKs for
original transmissions incorrectly acking retransmissions continues
until the SACKs of all the original transmissions to B1 (up to
TSN 50) are received by A. Thus, the SACKs from the
original transmissions cause C2 to grow (possibly drastically)
through misinterpretation of the feedback.
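
The missing-report bookkeeping in this example can be traced with a small simulation. This is a simplified sketch, not SCTP's full Fast Retransmit logic. It assumes that a SACK counts as a missing report for an unacked TSN only when that SACK newly acks a higher TSN (one reading of the reordering modification mentioned in section 2.1), it treats TSNs 1-54 as outstanding throughout, and it ignores cwnd limits. The SACK sequence is reconstructed from the times given in the example, and all names are hypothetical.

```python
FAST_RTX_THRESHOLD = 4  # missing reports needed before a fast retransmit


def count_missing_reports(sacks, outstanding):
    """Replay (cumulative_ack, gap_acked_tsns) SACKs and count missing reports.

    Returns the per-TSN missing-report counts and the TSNs that reach the
    fast retransmit threshold, in the order they reach it.
    """
    acked = set()
    missing = {tsn: 0 for tsn in outstanding}
    marked = []
    for cum_ack, gap_acked in sacks:
        covered = {t for t in outstanding if t <= cum_ack} | set(gap_acked)
        newly_acked = covered - acked
        acked |= covered
        if not newly_acked:
            continue
        highest_new = max(newly_acked)
        for tsn in outstanding:
            # ASSUMPTION: only TSNs below the highest newly acked TSN are
            # reported missing by this SACK.
            if tsn in acked or tsn >= highest_new:
                continue
            missing[tsn] += 1
            if missing[tsn] == FAST_RTX_THRESHOLD:
                marked.append(tsn)
    return missing, marked


# SACK for TSN 1, then the SACKs triggered by TSNs 51 and 52 (t = 41.1, 41.2).
sacks = [(1, []), (1, [51]), (1, [51, 52])]
# 40 SACKs carrying cumulative acks 2-41 (t = 42 to t = 81).
sacks += [(cum, [51, 52]) for cum in range(2, 42)]
# SACKs triggered by TSNs 53 and 54 (t = 81.2, 81.3).
sacks += [(41, [51, 52, 53]), (41, [51, 52, 53, 54])]

missing, marked = count_missing_reports(sacks, range(1, 55))
```

With this input, TSNs 42-50 reach four missing reports on the last SACK even though none of them has been lost, which is the falsely triggered fast retransmit described above.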

2.2  Discussion 

While the values chosen in our example illustrate but a
single case of the congestion window overgrowth problem, our
preliminary investigation shows that the problem occurs for a
range of {propagation delay, bandwidth, MTU} settings. For
example, with both paths having RTTs of 200 ms (bandwidth =
100 Kbps, propagation delay = 40 ms) and MTU = 1500 bytes,
the incorrect retransmission starts much earlier (at TSN 3), and
the cwnd overgrowth is even more dramatic.
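
The 200 ms RTT quoted for this second setting is consistent with the stated bandwidth, propagation delay and MTU. The derivation below assumes that Kbps denotes 10^3 bits per second, that the RTT counts the transmission time of one full-sized data packet plus the propagation delay in each direction, and that the transmission time of the returning SACK is negligible.

```latex
t_{\mathrm{tx}} = \frac{1500 \times 8\ \text{bits}}{100 \times 10^{3}\ \text{bits/s}} = 120\ \text{ms},
\qquad
\mathrm{RTT} \approx t_{\mathrm{tx}} + 2 \times 40\ \text{ms} = 200\ \text{ms}
```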

The congestion window overgrowth problem exists even if the
buffering occurs not at the sender’s link layer, but in a router
along the path (in figure 1, path 1). In essence, the
transport layers at the endpoints can be thought of as the
sending and receiving entities, and the buffering could potentially
be distributed anywhere along the end-to-end path. Further,
the reduction in C1 causes the sender to reduce its sending
rate on path 1 unnecessarily. In our preliminary investigation
of load balancing, we have observed multiple occurrences of
such false retransmissions. Such false retransmissions cause
the sending rate to reduce drastically on path 1, resulting in
suboptimal utilization of the path.


3  PROPOSED SOLUTIONS

The TCP-unfriendly cwnd growth and incorrect
retransmissions during changeover occur due to a current inadequacy of
SCTP: either (i) the sender is unable to distinguish SACKs
for transmissions from SACKs for retransmissions, or (ii) the
sender’s congestion control mechanism is unaware of the
occurrence of a changeover, and hence is unable to identify
reordering introduced due to changeover. Addressing either of
these inadequacies will solve the problems of TCP-unfriendly
cwnd growth and unnecessary retransmissions.

The Rhein algorithm [6] solves the problems by addressing (i).
The Rhein algorithm is based on the Eifel algorithm, which
uses meta information in the TCP header. This meta
information is used in disambiguating acks for transmissions from
acks for retransmissions to improve the throughput of a TCP
connection. The Rhein algorithm uses meta information in the
SCTP header to curb the unnecessary cwnd growth and
reduction due to spurious retransmissions. In our initial conception
of the Rhein algorithm, each data packet has to carry an extra
Retransmission Identifier (RTID) Chunk, and each SACK has
to carry an extra RTID Echo. Additional complexity is also
introduced at the sender and receiver for processing these new
chunks.

The Changeover Aware Congestion Control (CACC)
algorithms solve the problem by addressing (ii). In other words,
the CACC solutions introduce changeover awareness in the
sender’s congestion control mechanism. The cwnd overgrowth
occurs due to the sender misinterpreting SACK feedback,
and incorrectly sending fast retransmissions. CACC
algorithms curb the cwnd miscalculations by eliminating these
improper fast retransmissions. The key in a CACC
algorithm is maintaining state at the sender for each destination
when changeover happens. On receipt of a SACK, the sender
selectively increases the missing report count for TSNs in
the retransmission list, thus preventing incorrect fast
retransmissions. [5] describes two CACC algorithms: Conservative
CACC (C-CACC) and Split Fast Retransmit CACC
(SFR-CACC). The C-CACC algorithm has the disadvantage that in
the face of loss, a significant number of TSNs could
potentially wait for a retransmission timeout when they could have
been fast retransmitted. The SFR-CACC algorithm alleviates
this disadvantage. [5] provides the details of the CACC
algorithms, including verification of the effectiveness of the
SFR-CACC algorithm through ns-2 simulations.


4  ANALYSIS AND FUTURE WORK

Results from [5] suggest that the problem presented in
Section 2 might not be a “corner case.” By approaching the
problem from different perspectives, the Rhein algorithm and the
CACC algorithms all solve the problem of cwnd overgrowth.
The Rhein algorithm recognizes that this growth occurs due to
the sender’s inability to distinguish SACKs for original
transmissions from SACKs for retransmissions. This
algorithm does not solve the problem of unnecessary fast
retransmissions on a changeover. This algorithm also adds the
overhead of an extra chunk for every SCTP packet.
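
The disambiguation that an echoed retransmission identifier makes possible can be sketched as follows. This is a hypothetical illustration of the idea only, not the Rhein algorithm as specified in [6]: the class names, the use of a per-chunk counter as the RTID value, and the `resolve_ack` helper are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Transmission:
    rtid: int          # hypothetical identifier carried with the data chunk
    destination: str   # destination address this copy was sent to


@dataclass
class ChunkState:
    tsn: int
    transmissions: list = field(default_factory=list)

    def send(self, destination):
        """Record one (re)transmission and return the RTID it carries."""
        # ASSUMPTION: 0 marks the original, 1 the first retransmission, ...
        rtid = len(self.transmissions)
        self.transmissions.append(Transmission(rtid, destination))
        return rtid


def resolve_ack(chunk, echoed_rtid):
    """Map an echoed RTID back to the copy that actually reached the receiver.

    Returns (destination to credit, whether later retransmissions were spurious).
    """
    acked_copy = chunk.transmissions[echoed_rtid]
    spurious = echoed_rtid < len(chunk.transmissions) - 1
    return acked_copy.destination, spurious


# TSN 42 from the example: original sent to B1, retransmission sent to B2.
chunk = ChunkState(tsn=42)
chunk.send("B1")
chunk.send("B2")

# The SACK arriving at t = 82 was generated by the original transmission.
credited, spurious = resolve_ack(chunk, echoed_rtid=0)
```

In this sketch the SACK is attributed to the copy sent to B1 and the retransmission is flagged as spurious, so a sender could avoid growing C2 on the strength of that SACK. The retransmission itself has still been sent, which matches the observation above that unnecessary fast retransmissions are not prevented.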

The CACC algorithms maintain state information during a
changeover, and use this information to avoid incorrect fast
retransmissions. These algorithms have the added advantage
that no extra bits are added to any packets, and thus the load
on the wire and network is not increased. One disadvantage
of the CACC algorithms is that some of the TSNs on the old
primary are ineligible for fast retransmit. Furthermore,
complexity is added at the sender to maintain and use the added
state variables.
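
The effect of counting missing reports selectively can be illustrated with a minimal sketch. It is not the C-CACC or SFR-CACC algorithm of [5]; it assumes one simple rule for illustration, namely that a SACK counts as a missing report for an unacked TSN only if it newly acks a higher TSN that was sent to the same destination. The function name, the `sent_to` map and the loss scenario are hypothetical.

```python
FAST_RTX_THRESHOLD = 4  # missing reports needed before a fast retransmit


def count_missing_reports_per_destination(sacks, sent_to):
    """Replay (cumulative_ack, gap_acked_tsns) SACKs with per-destination counting.

    sent_to maps each outstanding TSN to the destination it was sent to.
    """
    acked = set()
    missing = {tsn: 0 for tsn in sent_to}
    marked = []
    for cum_ack, gap_acked in sacks:
        covered = {t for t in sent_to if t <= cum_ack} | set(gap_acked)
        newly_acked = covered - acked
        acked |= covered
        # Highest newly acked TSN for each destination seen in this SACK.
        highest_new = {}
        for t in newly_acked:
            dest = sent_to[t]
            highest_new[dest] = max(highest_new.get(dest, t), t)
        for tsn, dest in sent_to.items():
            if tsn in acked:
                continue
            # ASSUMPTION: skip TSNs whose destination saw no higher new ack.
            if dest not in highest_new or tsn >= highest_new[dest]:
                continue
            missing[tsn] += 1
            if missing[tsn] == FAST_RTX_THRESHOLD:
                marked.append(tsn)
    return missing, marked


# Changeover example: TSNs 1-50 were sent to B1, TSNs 51-54 to B2.
sent_to = {t: "B1" for t in range(1, 51)}
sent_to.update({t: "B2" for t in range(51, 55)})
sacks = [(1, []), (1, [51]), (1, [51, 52])]
sacks += [(cum, [51, 52]) for cum in range(2, 42)]
sacks += [(41, [51, 52, 53]), (41, [51, 52, 53, 54])]
missing, marked = count_missing_reports_per_destination(sacks, sent_to)

# Hypothetical loss on a single path: TSN 45 is dropped, TSNs 46-49 arrive.
loss_sent_to = {t: "B1" for t in range(1, 51)}
loss_sacks = [(cum, []) for cum in range(1, 45)]
loss_sacks += [(44, list(range(46, 46 + n))) for n in range(1, 5)]
loss_missing, loss_marked = count_missing_reports_per_destination(loss_sacks, loss_sent_to)
```

Under this rule the SACKs triggered by TSNs 51-54 generate no missing reports for the TSNs still in flight to B1, so no fast retransmit is triggered in the changeover example, while a genuine loss on a single path is still detected after four reports.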

We have implemented SFR-CACC in the NetBSD/FreeBSD
release for the KAME stack [1, 2]. The implementation uses
three flags and one TSN marker for each destination, as
described in [5]. Approximately twenty lines of C code were
needed to facilitate the SFR-CACC algorithm, most of which
will be executed only when a changeover is performed in an
association.

In  light  of  the  enveloping  issue  of  end-to-end  load  balancing, 
we  plan  to  research  the  following  questions  in  the  future: 

•  How  well  do  the  Rhein  and  CACC  algorithms  perform 
during  cycling  changeovers?  A  changeover  where  the 
sender  repeatedly  cycles  through  the  destination  address 
space  while  sending  data  is  called  a  cycling  changeover. 

•  In  an  FCS  network  wireless  environment  where  paths 
have  frequent  disruptions,  would  load  balancing  improve 
or  degrade  overall  performance? 

•  Given  the  path  parameters  for  all  paths  between  the 
sender  and  the  receiver,  is  it  possible  for  the  sender  to 
send  data  out  of  order  such  that  the  receiver  receives  all 
data  in  order?  Is  it  possible  for  the  sender  to  do  the  same 
while  probing  for  path  information  at  the  same  time? 
This  line  of  thought  leads  to  efficient  load  balancing  for 
realtime  and/or  multimedia  transfers. 

•  What is the effect of performing shared/separate
congestion control at the sender among various paths to
the receiver when the bottlenecks along the paths are
shared/separate? This question arises from an inquiry
into the effects of shared/separate bottlenecks on SCTP
congestion control. This study is important for SCTP to
be TCP-friendly in sending data.

5  DISCLAIMER 

The views and conclusions contained in this document are
those of the authors and should not be interpreted as
representing the official policies, either expressed or implied, of the
Army Research Laboratory or the U.S. Government.